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Abstract 

In many systems, the time scales of the microscopic dynamics and macroscopic dynamics 
of interest are separated by many orders of magnitude. Examples abound, for instance nucle- 
ation, protein folding, and chemical reactions. For these systems, direct simulation of phase 
space trajectories does not efficiently determine most physical quantities of interest. The last 
decade has seen the advent of methods circumventing brute force simulation. For most dy- 
namical quantities, these methods all share the drawback of systematical errors. We present a 
novel method for generating ensembles of phase space trajectories. By sampling small pieces 
of these trajectories in different phase space domains and piecing them together in a smart 
way using equilibrium properties, we obtain physical quantities such as transition times. This 
method does not have any systematic error and is very efficient; the computational effort to 
calculate the first passage time across a free energy barrier does not increase with the height of 
the barrier. The strength of the method is shown in the Ising model. Accurate measurements 
of nucleation times span almost ten orders of magnitude and reveal corrections to classical 
nucleation theory. 

1 Introduction 

The average time it takes a protein to fold, an undercooled liquid to crystallize, or a chemically 
active molecule to react, can in principle be obtained from brute force computer simulations, by 
simply starting several times in the unfolded, liquid or pre-reaction state and then integrating 
the dynamical equations in time until folding, crystallization or reaction takes place. However, 
the typical time scales of the microscopic dynamics and macroscopic dynamics of interest are 
often separated by many orders of magnitude; for such systems, direct simulation of relevant 
phase space trajectories is very inefficient, if at all possible. In the last decade, methods have 
been developed that sample transition pathways while circumventing brute force simulation, such 
as transition path sampling pQ, transition interface sampling [2 and milestoning [3], but most 
dynamical quantities, e.g. the average transition time, are not provided by those methods or are 
provided with systematic errors. We present a method to determine such dynamical quantities, 
free from systematic errors and very efficient. Our method is generally applicable to systems with 
known equilibrium properties, consisting of two regions with locally stable states, separated from 
each other by a barrier, that may be very high. 
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2 Method 



2.1 Problem 

A typical question that can be addressed by our method is the following: if a system is in equilib- 
rium in a region A of the phase space, what is the average time of first arrival in another region B? 
One should typically think of A and B as attracting basins in phase space, separated by a barrier. 
Examples of structures where such phenomena are found include nucleation, protein folding and 
chemical reactions. The simplest approach to answer such questions is by direct simulation: the 
system is started in A and evolved in time until B is reached, and this procedure is repeated 
many times to collect statistics. If, however, after leaving A returns to it are much more frequent 
than traversals to B, this direct simulation approach becomes very inefficient, since most of the 
computational effort is invested in dynamical trajectories from A back to A, rather than to B. 
Our method distributes the computational effort more efficiently, spending more time on actual 
traversals. 

2.2 Sampling subtrajectories 

The main idea behind our method is to sample different relatively small pieces of the phase space 
trajectories, which we call subtrajectories, and combine them with an appropriate weighting into 
complete trajectories. To classify the different subtrajectories, one should identify a slice M in 
phase space, such that every path connecting A and B has to pass through it. In the case of 
nucleation, for instance, the regions A, B and M could be the set of states in which the size of 
the largest nucleus is smaller than half, larger than twice, or equal to the critical nucleus size (or 
a good estimate of this). A long simulation trajectory can then be divided into subtrajectories, 
which are classified according to (1) their initial and final state (A or B) where the system resides 
at least the correlation time rJ3, (2) whether M has been crossed (denoted by an 'M' between the 
initial and final state), and (3) whether the path crosses the other region without residing there 
longer than t c continuously or not (denoted by an 'x' or 'o' respectively). All these different types 
of subtrajectories are illustrated in figure [TJ The figure also shows A- and B-subtrajectories that 
account for the time spent in A or B in excess of r c . 

The different types of subtrajectories are generated by performing three different simulations. 
First, by starting in A, staying there for a time r c and evolving from then, ignoring the paths 
that go through M, the A- and AA-subtrajectories are sampled. Next, analogous simulations 
around B are performed to find the B- and BB-subtrajectories. Finally, simulations starting in 
M are performed to generate the subtrajectories through M. These subtrajectories through M are 
sampled by starting in M and evolving both forward and backward in time until a correlation time 
t c is spent in A or B continuously at both ends of the path. As shown in [H [5] , the starting points 
should be chosen from the set of points of first arrival in M from either A or B to sample the 
trajectories with the correct frequency. Because of time reversal symmetry, this can be achieved 
by sampling points on M from its equilibrium distribution, generating the trajectory by going 
forward and backward in time, and ignoring all paths that encounter M again on its backward 
part. 

2.3 Recombination of subtrajectories 

Now that the different types of subtrajectories are sampled, they should be recombined to generate 
complete phase space trajectories. These are concatinations of the subtrajectories, where the ones 
of the types AA, AMAo, AMAx, AMB, BB, BMBo, BMBx and BMA are intertwined with A- and 
B-subtrajectories. Recombination is based on the notion that in the stable regions in phase space 

1 The correlation time can be different for the regions A and B, but for simplicity we take a single value for t c . 
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Figure 1: Division of a trajectory in subtrajectories. A cut is made in the trajectory every time 
that the system resides in region A or B longer than some time r c , which is the time in which 
the system loses its memory of when, how and where it entered. Random recombinations of these 
subtrajectories are equally valid trajectories. 

A and B the system will mostly wander for a long time. If this time exceeds some value r c the 
system has lost memory of where it entered. Since after this time there is no correlation between 
the entrance and exit point of region A or B, any random recombination of these subtrajectories 
would constitute a valid trajectory. The subtrajectories of different simulations can be recombincd 
if given the proper weights; that is why the method is efficient. In a straightforward simulation the 
subtrajectories crossing M typically are extremely rare, which strongly reduces statistical accuracy, 
but we generate additional subtrajectories through M by starting there and these subtrajectories 
can be combined with the AA- and BB-subtrajectories to sample long trajectories. 

The weights for the different sets of subtrajectories can be determined from the condition that in 
the resulting long trajectories the system must be found in A, B or M with the correct equilibrium 
probabilities, p^ A \ p^ B ' and p( M \ We assume these are known, either analytically or from other 
kinds of simulations, for instance parallel tempering [6] , methods involving umbrella sampling OH] , 
cluster algorithms [10] or the Wang-Landau method [IT]. For each subtrajectory, we measure 
the time it spends in regions A, B and M. From these measurements we determine for each class C 
of subtrajectories (where C can be AA, AMAo, AMAx, AMB, BB, BMBo, BMBx and BMA) the 
average time spent in each region X (where X can be A, B or M) and these average times are also 

(X) 

determined for the class M of all subtrajectories crossing M; these are the quantities . The 
notation Tc without superscript will denote the average total time spent on one subtrajectory of 
class C. Since these different subtrajectories are always preceded and succeeded by either an A- 

(X) 

or B-subtrajectory, the time these latter subtrajectories take is also included in the times Tq . 
The frequency with which a long trajectory enters subtrajectories of class C is called nc- With 
these definitions, it follows immediately that the probabilities of being in the three regions satisfy 
the equalities 
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From these the subtrajectory frequencies may be expressed as: 



n A A = (p W -n M T^ ) )/Ti A 2, (2a) 
n BB = (p (B) -n M T™)/Tg\ (2b) 
n M =p^/T^\ (2c) 



During the simulations through M, the number of times N c that the specific classes of subtrajec- 
tories through M are encountered, are counted. These lead to the following relationships: 

N c 

nc = ~Tr n>M, (3) 
iv M 

and these determine the remaining frequencies. 

From the frequencies of the different subtrajectories we immediately obtain the probabilities for 
subtrajectories leaving A or B to be of specific type: 



Paa = tiaa/Na, (4a) 

PamAo = "amAo/A/a, (4b) 

PAMAx = "AM Ax /A/a, (4c) 

Pamb = "amb/A/a, (4d) 

with A/a = iiaa + fiAMAo + fiAMAx + RAMB and the analogous relations for the paths starting 
in B. Knowing these probabilities we can randomly recombine subtrajectories with the proper 
weights to generate complete trajectories. 



2.4 Determining transition times 

Depending on the exact quantity of interest, often the explicit recombination process can be 
skipped and replaced by a combination of appropriate averages over the subtrajectories. Here we 
specifically want to address the average transition time from A to B. Note that a traversal to B 
has to end with either an AMB- or AMAx-subtrajectory. To calculate the average transition time, 
we need to know the probabilities of finishing with an AMB- or AMAx-subtrajectory, the average 
numbers of times the AA- and AMAo-subtrajectories are traversed before this happens and the 
average times these subtrajectories take. This results in the following equation for the transition 
time: 

T * _ PAaTaA + PAMAo^AMAo + PAMAx^AMAx + P AM B ^AMB /r\ 

j a^b — : • W 

PAMAx + PAMB 

The labels "first" are added to TamAx and Tamb, since the first time that region B is reached is 
relevant for the transition time, instead of the total time of the subtrajectory. These can also be 
measured during the simulations. We added an asterisk, to distinguish these times from the first 
arrival time of B for a system starting Boltzmann distributed in A; these times are the first arrival 
times of B for a system starting in A with another distribution, namely the distribution of points 
after being in A for a time t c . To obtain the real first arrival time Ta^b, we must perform final 
simulations that start Boltzmann distributed in A until either r c time is spent in A, or arrival 
in B occurs. We call the average time until a time r c in A is spent T A *f; r A , and the average time 
until arrival in B occurs T^j^; these events happen with the probabilities Pa" A an d Pa^b- The 
transition time from A to B, starting Boltzmann distributed in A, is then 

start /restart , rr\* \ , start restart ( o\ 

±A-,B = PA^A {-L A^A + 1 A^Bj + PA-.B 1 A-.B- W 
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2.5 Overview 



Since the determination of the transition time from A to B involves multiple simulations in which 
a lot of quantities are measured, this section provides an overview of the different simulations 
including all measured quantities; these are presented in table [TJ With these quantities measured, 
equations (O - © yield the average time of first arrival in B for a system starting Boltmann dis- 
tributed in A; other dynamical quantities can be obtained as well from different related equations. 



Simulations starting with a time t c in A 


Taa 

T (A) 
AA 


The total time of an A-trajectory 
The total time of an AA-trajectory 
The time an AA-trajectory spends in A 

Simulations starting with a time t c in B 


T B 
Tbb 

T (B) 
BB 


The total time of a B-trajectory 
The total time of a BB-trajectory 
The time a BB-trajectory spends in B 

Simulations starting in M 


Tm 

_(A) „(B) rp(M) 

o^first o^first 
AM Ax' AMB 

N M 

AamAx, AamAoj Aamb 
AbmBx, AbmBo, Abma 


The total time of an M-trajectory 

The time an M-trajectory spends in A, B or M, respectively 
The first time that an AMAx- or AMB-trajectory arrives in B 
The total number of M-trajectories 

The number of AMAx-, AMAo- and AMB-trajectories, respectively 
The number of BMBx-, BMBo- and BMA-trajectories, respectively 


Simulations starting Boltzmann distributed in A 


o^start 

J A^A 

T^start 

J A^B 
„start ^.start 
PA^A' -FA^B 


The time until a time r c is spent in A continuously 

The time to reach B without spending a time t c in A continuously 

The probabilities of starting with A^A and A^B 



Table 1: The measured quantities in the different simulations 



3 Simple toy model 

As a first test we applied this method to a system consisting of a 10 x 10-lattice with a potential 
energy assigned to each sit^l- The dynamics consist of jumps of a single particle to neighboring 
sites with Metropolis p~2] jump rates. "Regions" A and B are two opposing corners of the lattice, 
which are minima of the potential. M is the diagonal in between, which forms a ridge in the 
potential landscape. The simulations are performed for different values of the temperature. Their 
results are shown in figure^ together with the results of brute force simulations. Both simulations 
lasted for one minute of CPU time. Also plotted is the exact transition time that is calculated by 
solving a set of linear equations. 

The results are as expected: both our method and the brute force method sample the transition 
time without any systematic error. However, with our method the statistical error is constant 
as a function of the transition time, while the statistical errors of brute force simulation increase 
proportional to the inverse square root of the transition time, as the law of large numbers dictates 
(see the inset of figure [2]). 

2 The potential energy of the lattice sites is given by the function V = {{x + 2y)(x + 2y- 2.6)) 2 + (x - y - 0.1) 2 
evaluated in the points x, y = . . . 0.9 with steps of 0.1. The sites (0, 0) and (0.9, 0.9) form potential minima, while 
the diagonal in between is around the potential maximum. 
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Figure 2: Results of our method applied to the toy model. Shown are the results of our method, 
compared to brute force simulations and an analytic result. Error bars of our method are omitted 
since they are very small. The inset shows the relative error of both our method and brute force 
simulation. 



4 Ising model 

To show that the method does not only work efficiently in low-dimensional toy systems, we apply 
it to determine the nucleation time of a 64 x 64 Ising model with spin-flip dynamics and Metropolis 
acceptance probabilities, for a large range of parameter values for ft J and j3h, i.e. the coupling 
constant and the external magnetization in units of k^T. 

Details of the simulations are as follows: we start in a metastable state consisting of spins an- 
tiparallel to the external magnetization. The coordinate used to characterize the regions A, B and 
M is the number n.4 of spins that are parallel to the external magnetization, and that have four 
neighbours which are also parallel to it. This turns out to be a much better reaction coordinate 
than for instance the size of the largest cluster of parallel spins. The convenient property of 714 
is that it can only change by a maximum of five; if the size of the largest cluster is taken as a 
reaction coordinate, it can undergo large changes due to the merging or splitting of clusters. To 
obtain the free energy as a function of n.4 we use successive umbrella sampling |13j : by restricting 
n.4 to either i or i + 1, the free energy difference between n^ = i and 714 = i + 1 is determined; this 
process is repeated for increasing values of i, until the free energy (as a function of 714) returns to 
the value at — 0. A typical result of this free energy sampling is shown in figure [3l 

Next, the regions A, B and M are characterized in terms of 714. The region M is chosen at the top 
of the barrier and has width five, so that every nucleation trajectory intersects it. The regions A 
and B are such that the free energy barriers to its boundaries are 5 k^T. The motivation behind 
this is as follows: the barriers should be high enough that the system will linger in A and B long 
enough to lose correlation, but on the other hand low enough that it can be crossed by thermal 
activation to sample AA- and BB-trajectories; 5 k-gT seems a reasonable choice for that. We will 
call region A the pre-nucleation state and B the nucleated state, since when the system arrives 
in B, it is extremely likely to continue to a stable state in which most spins are aligned with the 
external field. The regions A, B and M are also indicated in figure [3l With the regions A, B and 
M defined and the probabilities of being there known, the method can be applied to determine 
the nucleation times. In our simulations, we took a correlation time r c of 100 attemped spin 
flips per site. Equilibration inside region A and B is fast, and the system is certainly statistically 
uncorrelated within this time. We also verified this in simulations with r c = 200 and 500. 

The resulting nucleation times are presented in figure |4j Note that the nucleation times span 10 
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Figure 3: Free energy of the two-dimensional Ising model with 64 x 64 sites, a coupling constant 
of j3J = 0.60 and an external field of (3h — 0.06, as function of the number 77,4 of spins parallel to 
the external field which have four aligned neighbors. Also the pre-nucleation region A, the barrier 
region M and the post-nucleation region B are indicated. 

orders of magnitude with constant relative statistical errors. For comparison, results of brute force 
simulations (if possible) and classical nucleation theorjH are also shown. The general trend is well 
captured by CNT. However, as shown in the insets, our method is accurate enough to reveal the 
shortcomings of CNT. 

The computational effort in this calculation of the average nucleation time is approximately 13 
hours of CPU time on an AMD-64 single-processor workstation for each set of temperature and 
field strength; one hour is spent for the determination of the free energy as a function of cluster 
size and 12 hours for the generation of the various subtrajectories and the nucleation time. An 
equal amount of computational effort was invested in the brute-force computations. 

We would like to thank Henk van Beijeren for useful discussion. 
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